Learning an explainable classifier often results in low accuracy model or ends up with a huge rule set, while learning a deep model is usually more capable of handling noisy data at scale, but with the cost of hard to explain the result and weak at generalization. To mitigate this gap, we propose an end-to-end deep explainable learning approach that combines the advantage of deep model in noise handling and expert rule-based interpretability. Specifically, we propose to learn a deep data assessing model which models the data as a graph to represent the correlations among different observations, whose output will be used to extract key data features. The key features are then fed into a rule network constructed following predefined noisy expert rules with trainable parameters. As these models are correlated, we propose an end-to-end training framework, utilizing the rule classification loss to optimize the rule learning model and data assessing model at the same time. As the rule-based computation is none-differentiable, we propose a gradient linking search module to carry the gradient information from the rule learning model to the data assessing model. The proposed method is tested in an industry production system, showing comparable prediction accuracy, much higher generalization stability and better interpretability when compared with a decent deep ensemble baseline, and shows much better fitting power than pure rule-based approach.
translated by 谷歌翻译
在本文中,我们考虑发现非凸锥优化的近似二阶固定点(SOSP),该点在仿射子空间和凸锥的交点上最小化了两倍的可微分函数。特别是,我们提出了一个基于牛顿 - 偶联的梯度(牛顿-CG)的障碍方法,用于查找$(\ epsilon,\ sqrt {\ epsilon})$ - 此问题的SOSP。我们的方法不仅可以实现,而且还达到了$ {\ cal o}(\ epsilon^{ - 3/2})$的迭代复杂性,它匹配找到$的二阶方法的最著名迭代复杂性(以找到$(\ epsilon,\ sqrt {\ epsilon})$ - 无约束的非convex优化的sosp。$ \ widetilde {\ cal o}的操作复杂性(\ epsilon^{ - 3/2} \ min \ {也是为我们的方法建立的。
translated by 谷歌翻译
图形神经网络(GNNS)将深度神经网络(DNN)的成功扩展到非欧几里德图数据,实现了各种任务的接地性能,例如节点分类和图形属性预测。尽管如此,现有系统效率低,培训数十亿节点和GPU的节点和边缘训练大图。主要瓶颈是准备GPU数据的过程 - 子图采样和特征检索。本文提出了一个分布式GNN培训系统的BGL,旨在解决一些关键思想的瓶颈。首先,我们提出了一种动态缓存引擎,以最小化特征检索流量。通过协同设计缓存政策和抽样顺序,我们发现低开销和高缓存命中率的精美斑点。其次,我们改善了曲线图分区算法,以减少子图采样期间的交叉分区通信。最后,仔细资源隔离减少了不同数据预处理阶段之间的争用。关于各种GNN模型和大图数据集的广泛实验表明,BGL平均明显优于现有的GNN训练系统20.68倍。
translated by 谷歌翻译
联合学习允许多个参与者在不公开数据隐私的情况下协作培训高效模型。但是,这种分布式的机器学习培训方法容易受到拜占庭客户的攻击,拜占庭客户通过修改模型或上传假梯度来干扰全球模型的训练。在本文中,我们提出了一种基于联邦学习(CMFL)的新型无服务器联合学习框架委员会机制,该机制可以确保算法具有融合保证的鲁棒性。在CMFL中,设立了一个委员会系统,以筛选上载已上传的本地梯度。 The committee system selects the local gradients rated by the elected members for the aggregation procedure through the selection strategy, and replaces the committee member through the election strategy.基于模型性能和防御的不同考虑,设计了两种相反的选择策略是为了精确和鲁棒性。广泛的实验表明,与典型的联邦学习相比,与传统的稳健性相比,CMFL的融合和更高的准确性比传统的稳健性,以分散的方法的方式获得了传统的耐受性算法。此外,我们理论上分析并证明了在不同的选举和选择策略下CMFL的收敛性,这与实验结果一致。
translated by 谷歌翻译
Designing experiments often requires balancing between learning about the true treatment effects and earning from allocating more samples to the superior treatment. While optimal algorithms for the Multi-Armed Bandit Problem (MABP) provide allocation policies that optimally balance learning and earning, they tend to be computationally expensive. The Gittins Index (GI) is a solution to the MABP that can simultaneously attain optimality and computationally efficiency goals, and it has been recently used in experiments with Bernoulli and Gaussian rewards. For the first time, we present a modification of the GI rule that can be used in experiments with exponentially-distributed rewards. We report its performance in simulated 2- armed and 3-armed experiments. Compared to traditional non-adaptive designs, our novel GI modified design shows operating characteristics comparable in learning (e.g. statistical power) but substantially better in earning (e.g. direct benefits). This illustrates the potential that designs using a GI approach to allocate participants have to improve participant benefits, increase efficiencies, and reduce experimental costs in adaptive multi-armed experiments with exponential rewards.
translated by 谷歌翻译
Transformer has achieved impressive successes for various computer vision tasks. However, most of existing studies require to pretrain the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) for achieving satisfactory performance, which is usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement generated by the ImageNet pretrained weights significantly degrades while transferring the weights to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach specifically for medical image classification with the Transformer backbone. Our BOLT consists of two networks, namely online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network representation of the same patch embedding tokens with a different perturbation. To maximally excavate the impact of Transformer from limited medical data, we propose an auxiliary difficulty ranking task. The Transformer is enforced to identify which branch (i.e., online/target) is processing the more difficult perturbed tokens. Overall, the Transformer endeavours itself to distill the transformation-invariant features from the perturbed tokens to simultaneously achieve difficulty measurement and maintain the consistency of self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading and diabetic retinopathy grading. The experimental results validate the superiority of our BOLT for medical image classification, compared to ImageNet pretrained weights and state-of-the-art self-supervised learning approaches.
translated by 谷歌翻译
Text clustering and topic extraction are two important tasks in text mining. Usually, these two tasks are performed separately. For topic extraction to facilitate clustering, we can first project texts into a topic space and then perform a clustering algorithm to obtain clusters. To promote topic extraction by clustering, we can first obtain clusters with a clustering algorithm and then extract cluster-specific topics. However, this naive strategy ignores the fact that text clustering and topic extraction are strongly correlated and follow a chicken-and-egg relationship. Performing them separately fails to make them mutually benefit each other to achieve the best overall performance. In this paper, we propose an unsupervised text clustering and topic extraction framework (ClusTop) which integrates text clustering and topic extraction into a unified framework and can achieve high-quality clustering result and extract topics from each cluster simultaneously. Our framework includes four components: enhanced language model training, dimensionality reduction, clustering and topic extraction, where the enhanced language model can be viewed as a bridge between clustering and topic extraction. On one hand, it provides text embeddings with a strong cluster structure which facilitates effective text clustering; on the other hand, it pays high attention on the topic related words for topic extraction because of its self-attention architecture. Moreover, the training of enhanced language model is unsupervised. Experiments on two datasets demonstrate the effectiveness of our framework and provide benchmarks for different model combinations in this framework.
translated by 谷歌翻译
This paper illustrates the technologies of user next intent prediction with a concept knowledge graph. The system has been deployed on the Web at Alipay, serving more than 100 million daily active users. Specifically, we propose AlipayKG to explicitly characterize user intent, which is an offline concept knowledge graph in the Life-Service domain modeling the historical behaviors of users, the rich content interacted by users and the relations between them. We further introduce a Transformer-based model which integrates expert rules from the knowledge graph to infer the online user's next intent. Experimental results demonstrate that the proposed system can effectively enhance the performance of the downstream tasks while retaining explainability.
translated by 谷歌翻译
Capturing feature information effectively is of great importance in vision tasks. With the development of convolutional neural networks (CNNs), concepts like residual connection and multiple scales promote continual performance gains on diverse deep learning vision tasks. However, the existing methods do not organically combined advantages of these valid ideas. In this paper, we propose a novel CNN architecture called GoogLe2Net, it consists of residual feature-reutilization inceptions (ResFRI) or split residual feature-reutilization inceptions (Split-ResFRI) which create transverse passages between adjacent groups of convolutional layers to enable features flow to latter processing branches and possess residual connections to better process information. Our GoogLe2Net is able to reutilize information captured by foregoing groups of convolutional layers and express multi-scale features at a fine-grained level, which improves performances in image classification. And the inception we proposed could be embedded into inception-like networks directly without any migration costs. Moreover, in experiments based on popular vision datasets, such as CIFAR10 (97.94%), CIFAR100 (85.91%) and Tiny Imagenet (70.54%), we obtain better results on image classification task compared with other modern models.
translated by 谷歌翻译
Despite some successful applications of goal-driven navigation, existing deep reinforcement learning-based approaches notoriously suffers from poor data efficiency issue. One of the reasons is that the goal information is decoupled from the perception module and directly introduced as a condition of decision-making, resulting in the goal-irrelevant features of the scene representation playing an adversary role during the learning process. In light of this, we present a novel Goal-guided Transformer-enabled reinforcement learning (GTRL) approach by considering the physical goal states as an input of the scene encoder for guiding the scene representation to couple with the goal information and realizing efficient autonomous navigation. More specifically, we propose a novel variant of the Vision Transformer as the backbone of the perception system, namely Goal-guided Transformer (GoT), and pre-train it with expert priors to boost the data efficiency. Subsequently, a reinforcement learning algorithm is instantiated for the decision-making system, taking the goal-oriented scene representation from the GoT as the input and generating decision commands. As a result, our approach motivates the scene representation to concentrate mainly on goal-relevant features, which substantially enhances the data efficiency of the DRL learning process, leading to superior navigation performance. Both simulation and real-world experimental results manifest the superiority of our approach in terms of data efficiency, performance, robustness, and sim-to-real generalization, compared with other state-of-art baselines. Demonstration videos are available at \colorb{https://youtu.be/93LGlGvaN0c.
translated by 谷歌翻译